AMENDMENTS TO THE CLAIMS 



The following listing of claims will replace all prior versions, and listings, of claims in the 
application. 

1. (Currently Amended) A method of performing a two-dimensional discrete cosine 
transform (DCT) using a microprocessor having an instruction set that includes single- 
instruction multiple-data (SIMD) floating point instructions, wherein the method comprises: 
receiving a two-dimensional block of integer data having C columns and R rows, 
wherein each of the R rows contains a set of C row data values, wherein the 
block of integer data is indicative of a portion of an image, wherein each of C 
and R is an even integer; and 
for each row, 

loading the entire set of C row data values of the row into a set of C/2 
registers of the microprocessor; 

converting the C row data values into floating point form, wherein each of the 
registers holds two of the floating point row data values, wherein said
converting is accomplished using a packed integer word to floating-point
conversion (pi2fw) instruction; and

performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations are performed 
using SIMD floating point instructions; 

altering the arrangement of values in the registers; 

performing a second plurality of weighted-rotation operations on the values in 
the registers; 

again altering the arrangement of the values in the registers; 
performing a third plurality of weighted-rotation operations on the values in 
the registers; 

yet again altering the arrangement of the values in the registers; 
performing a fourth plurality of weighted-rotation operations on the values in 
the registers to obtain C intermediate floating point values; and 
storing the C intermediate floating point values into a next available row of an
intermediate buffer. 
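The load-and-convert step of claim 1 can be modelled in portable C. This is a sketch under stated assumptions, not the claimed implementation: `f32x2` and `i16x4` are hypothetical stand-ins for 64-bit MMX registers, `emu_pi2fw` is a scalar emulation of the 3DNow! PI2FW instruction (which sign-extends the 16-bit words 0 and 2 of its source and converts them to two floats), and pairing consecutive row values `(row[2k], row[2k+1])` in one register is an assumed layout that the claim does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit 3DNow! register holding two floats. */
typedef struct { float lo, hi; } f32x2;

/* Hypothetical model of a 64-bit MMX register holding four signed words. */
typedef struct { int16_t w[4]; } i16x4;

/* Scalar emulation of PI2FW: words 0 and 2 of the source are sign-extended
   and converted to single precision; words 1 and 3 are ignored. */
static f32x2 emu_pi2fw(i16x4 src)
{
    f32x2 r = { (float)src.w[0], (float)src.w[2] };
    return r;
}

/* Load one row of c int16 values (c even) into c/2 two-float registers.
   ASSUMPTION: register k holds (row[2k], row[2k+1]); the claim does not fix
   the pairing. Each pair is staged in words 0 and 2 so that one PI2FW
   converts both values. */
static void load_row_pairs(const int16_t *row, int c, f32x2 *regs)
{
    for (int k = 0; k < c / 2; ++k) {
        i16x4 staged = { { row[2 * k], 0, row[2 * k + 1], 0 } };
        regs[k] = emu_pi2fw(staged);
    }
}
```

For an 8-value row this yields four registers, matching the C/2 registers recited in the claim; the weighted-rotation stages would then operate on `regs`.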

2. (Cancelled) 

3. (Previously Presented) The method of claim 1, wherein said weighted-rotation operations 
are accomplished using a packed swap doubleword (pswapd) instruction, a packed floating-point
multiplication (pfmul) instruction and a packed floating-point negative accumulate
(pfnacc) instruction.
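The three instructions named in claim 3 suffice to compute one weighted rotation of the form recited in claim 22 on a register holding (X0, X1). The sketch below emulates them in scalar C. The helper names are hypothetical; the semantics follow AMD's description of PSWAPD (exchange the two 32-bit halves), PFMUL (lane-wise multiply) and PFNACC (low result is the difference of the first operand's lanes, high result is the difference of the second operand's lanes). The coefficient vectors (A, -B) and (A, B) are one possible arrangement, not necessarily the one used in the application.

```c
#include <assert.h>

/* Hypothetical model of a 64-bit register holding two floats. */
typedef struct { float lo, hi; } f32x2;

/* PSWAPD emulation: exchange the low and high 32-bit halves. */
static f32x2 emu_pswapd(f32x2 a)
{
    f32x2 r = { a.hi, a.lo };
    return r;
}

/* PFMUL emulation: lane-wise multiply. */
static f32x2 emu_pfmul(f32x2 a, f32x2 b)
{
    f32x2 r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}

/* PFNACC emulation: low = a.lo - a.hi, high = b.lo - b.hi. */
static f32x2 emu_pfnacc(f32x2 a, f32x2 b)
{
    f32x2 r = { a.lo - a.hi, b.lo - b.hi };
    return r;
}

/* Weighted rotation of x = (X0, X1):
     Y0 =  A*X0 + B*X1
     Y1 = -B*X0 + A*X1 */
static f32x2 weighted_rotation(f32x2 x, float a, float b)
{
    const f32x2 k0 = { a, -b };               /* (A, -B) */
    const f32x2 k1 = { a, b };                /* (A,  B) */
    f32x2 p = emu_pfmul(x, k0);               /* (A*X0, -B*X1) */
    f32x2 q = emu_pfmul(emu_pswapd(x), k1);   /* (A*X1,  B*X0) */
    return emu_pfnacc(p, q);                  /* (A*X0 + B*X1, A*X1 - B*X0) */
}
```

The swap supplies the cross terms, so each rotation costs one swap, two multiplies and one horizontal subtract, with both outputs landing in a single register.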

4. (Cancelled) 

5. (Cancelled) 

6. (Currently Amended) The method of claim 1 [[5]], further comprising:

for two columns of the intermediate buffer at a time: 

loading data from the two columns into a plurality of registers of the 
microprocessor so that each of the registers holds one value from a 
first of the two columns and one value from a second of the two 
columns, wherein the one value from the first of the two columns and 
the one value from the second of the two columns are taken from the 
same row of the intermediate buffer; and 
performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations for two columns
are performed in parallel using SIMD floating point instructions. 

7. (Previously Presented) The method of claim 6, wherein said weighted-rotation operations 
for two columns at a time are accomplished using a packed floating-point multiplication 
(pfmul) instruction, a packed floating-point subtraction (pfsub) instruction and a packed
floating-point addition (pfadd) instruction. 
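When two columns are processed together, as in claims 6 and 7, each register lane belongs to a different column, so the rotation coefficients can simply be broadcast to both lanes and no swap is needed. A minimal sketch with hypothetical scalar emulations of PFMUL, PFADD and PFSUB, assuming the low lane holds the first column and the high lane the second:

```c
#include <assert.h>

/* Hypothetical two-lane register: lo = first column, hi = second column. */
typedef struct { float lo, hi; } f32x2;

static f32x2 emu_pfmul(f32x2 a, f32x2 b) { f32x2 r = { a.lo * b.lo, a.hi * b.hi }; return r; }
static f32x2 emu_pfadd(f32x2 a, f32x2 b) { f32x2 r = { a.lo + b.lo, a.hi + b.hi }; return r; }
static f32x2 emu_pfsub(f32x2 a, f32x2 b) { f32x2 r = { a.lo - b.lo, a.hi - b.hi }; return r; }

/* Rotate rows i and j of a column pair in place. Each lane computes
     Yi =  A*Xi + B*Xj
     Yj = -B*Xi + A*Xj
   for its own column; the coefficients are broadcast to both lanes. */
static void weighted_rotation_2col(f32x2 *xi, f32x2 *xj, float a, float b)
{
    const f32x2 ka = { a, a };
    const f32x2 kb = { b, b };
    f32x2 yi = emu_pfadd(emu_pfmul(*xi, ka), emu_pfmul(*xj, kb));
    f32x2 yj = emu_pfsub(emu_pfmul(*xj, ka), emu_pfmul(*xi, kb));
    *xi = yi;
    *xj = yj;
}
```

Both columns advance through the same rotation in four multiplies, one add and one subtract, which is the parallelism the claim relies on.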
8. (Original) The method of claim 6, further comprising:

for two columns at a time, 

as each weighted-rotation operation is done, storing weighted-rotation 
operation results to the intermediate buffer. 

9. (Original) The method of claim 8, further comprising: 

for two columns at a time, 

retrieving weighted-rotation operation results from the intermediate buffer; 
performing a second plurality of weighted-rotation operations on the retrieved
values; 

again storing weighted-rotation operation results to the intermediate buffer as 
the weighted-rotation operations of the second plurality are done; 

again retrieving weighted-rotation operation results from the intermediate 
buffer; 

performing a third plurality of weighted-rotation operations on the retrieved 
values; 

yet again storing weighted-rotation operation results to the intermediate buffer 
as the weighted-rotation operations of the third plurality are done; 

yet again retrieving weighted-rotation operation results from the intermediate 
buffer; 

performing a fourth plurality of weighted-rotation operations on the retrieved 
values; 

converting the weighted-rotation operation results from the fourth plurality to 
integer results. 
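Claims 8 and 9 describe a column pass that retrieves values from the intermediate buffer, rotates them, and writes each result back before the next rotation, finishing with a conversion to integers. The sketch below models that loop for one column pair. The `rotation` descriptor, the flat list of rotations (stage boundaries are simply positions in the list) and the truncating conversion, modelled on PF2ID for in-range inputs, are illustrative assumptions rather than details taken from the application.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane register: lo = first column, hi = second column. */
typedef struct { float lo, hi; } f32x2;

/* Hypothetical descriptor: rotate buffer rows i and j with coefficients a, b. */
typedef struct { int i, j; float a, b; } rotation;

/* Lane-wise weighted rotation (what PFMUL/PFADD/PFSUB would compute). */
static void rotate_2col(f32x2 *xi, f32x2 *xj, float a, float b)
{
    f32x2 yi = { a * xi->lo + b * xj->lo, a * xi->hi + b * xj->hi };
    f32x2 yj = { a * xj->lo - b * xi->lo, a * xj->hi - b * xi->hi };
    *xi = yi;
    *xj = yj;
}

/* Apply a list of rotations to one column pair held in the intermediate
   buffer (one f32x2 per row). Every result is stored back as soon as it is
   computed, so later rotations and later stages read the updated values. */
static void run_rotations(f32x2 *buf, const rotation *rots, int nrots)
{
    for (int r = 0; r < nrots; ++r) {
        f32x2 xi = buf[rots[r].i];            /* retrieve */
        f32x2 xj = buf[rots[r].j];
        rotate_2col(&xi, &xj, rots[r].a, rots[r].b);
        buf[rots[r].i] = xi;                  /* store back immediately */
        buf[rots[r].j] = xj;
    }
}

/* Final conversion to integers. C casts truncate toward zero, which matches
   PF2ID for in-range values; out-of-range handling is not modelled. */
static void to_integers(const f32x2 *buf, int rows, int32_t *out)
{
    for (int r = 0; r < rows; ++r) {
        out[2 * r] = (int32_t)buf[r].lo;
        out[2 * r + 1] = (int32_t)buf[r].hi;
    }
}
```

Writing results back after every rotation keeps register pressure low at the cost of extra memory traffic; the buffer acts as the working storage between the four pluralities of rotations.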



10. (Original) The method of claim 9, further comprising: 

for two columns at a time, writing the integer results to an output buffer. 



11. (Currently Amended) A method of performing a discrete cosine transform (DCT) using a 
microprocessor having an instruction set that includes single-instruction multiple-data 
(SIMD) floating point instructions, wherein the method comprises: 
receiving a two-dimensional block of integer data having C columns and R rows,
wherein each of C and R is an even integer, wherein the two-dimensional 
block represents a portion of an image; and 
for two columns at a time, 

loading column data from the two columns into registers of the 
microprocessor so that each of the registers holds one value from a 
first of the two columns and one value from a second of the two 
columns, wherein the one value from the first of the two columns and 
the one value from the second of the two columns are taken from the 
same row of the two-dimensional block; 
converting the column data into floating point form; and 
performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations for the two 
columns are performed in parallel using SIMD floating point 
instructions, wherein said weighted-rotation operations are
accomplished using a packed floating-point multiplication (pfmul)
instruction, a packed floating-point subtraction (pfsub) instruction and
a packed floating-point addition (pfadd) instruction;
as each weighted-rotation operation is done, storing weighted-rotation 
operation results to an intermediate buffer. 
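Claim 11 loads each register with one value from each of two columns, both taken from the same row. Assuming a row-major `int16_t` block and adjacent column pairs (c0, c0 + 1), neither of which the claim requires, the gather and floating-point conversion can be sketched as follows; `load_column_pair` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane register: lo = column c0, hi = column c0 + 1. */
typedef struct { float lo, hi; } f32x2;

/* Gather columns c0 and c0 + 1 of a row-major block with `rows` rows and
   `cols` columns into `rows` registers, converting to float on the way:
   regs[r] = (block[r][c0], block[r][c0 + 1]). Both values in a register
   come from the same row r. */
static void load_column_pair(const int16_t *block, int rows, int cols,
                             int c0, f32x2 *regs)
{
    for (int r = 0; r < rows; ++r) {
        regs[r].lo = (float)block[r * cols + c0];
        regs[r].hi = (float)block[r * cols + c0 + 1];
    }
}
```

In this layout the two values of each pair are adjacent in memory, so each register can be filled from one contiguous 32-bit read; non-adjacent column pairs would need two separate reads per row.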

12. (Cancelled) 



13. (Cancelled) 



14. (Previously Presented) The method of claim 11, further comprising:
for two columns at a time, 

retrieving weighted-rotation operation results from the intermediate buffer; 
performing a second plurality of weighted-rotation operations on the retrieved 
values; 
again storing weighted-rotation operation results to the intermediate buffer as
the weighted-rotation operations of the second plurality are done;

again retrieving weighted-rotation operation results from the intermediate 
buffer; 

performing a third plurality of weighted-rotation operations on the retrieved 
values; 

yet again storing weighted-rotation operation results to the intermediate buffer 
as the weighted-rotation operations of the third plurality are done; 

yet again retrieving weighted-rotation operation results from the intermediate 
buffer; 

performing a fourth plurality of weighted-rotation operations on the retrieved 
values; 

converting the weighted-rotation operation results from the fourth plurality to 
integer results. 

15. (Original) The method of claim 14, further comprising: 

for two columns at a time, writing the integer results to an output buffer. 

16. (Currently Amended) A computer system comprising: 

a processor having an instruction set that includes single-instruction multiple-data 
(SIMD) floating point instructions; and 

a memory coupled to the processor, wherein the memory stores software instructions 
executable by the processor to implement a two-dimensional discrete cosine 
transform method, the method comprising: receiving a two-dimensional block 
of integer data having C columns and R rows, wherein each of the R rows 
contains a set of C row data values, wherein the block of integer data is 
indicative of a portion of an image, wherein each of C and R is an even 
integer; and 

for each row, 

loading the entire set of C row data values of the row into a set of C/2 
registers of the processor; 
converting the C row data values into floating point form, wherein each of the
registers holds two of the floating point row data values, wherein said
converting is accomplished using a packed integer word to floating-point
conversion (pi2fw) instruction; and

performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations are performed 
using SIMD floating point instructions; 

altering the arrangement of values in the registers; 

performing a second plurality of weighted-rotation operations on the values in 
the registers; 

again altering the arrangement of the values in the registers; 
performing a third plurality of weighted-rotation operations on the values in 
the registers; 

yet again altering the arrangement of the values in the registers; 

performing a fourth plurality of weighted-rotation operations on the values in
the registers to obtain C intermediate floating point values; and
storing the C intermediate floating point values into a next available row of an
intermediate buffer.

17. (Currently Amended) A carrier medium comprising software instructions executable by a 
microprocessor having an instruction set that includes single-instruction multiple-data 
(SIMD) floating point instructions to implement a method of performing a two-dimensional 
discrete cosine transform (DCT), wherein the method comprises: 

receiving a two-dimensional block of integer data having C columns and R rows, 
wherein each of the R rows contains a set of C row data values, wherein the 
block of integer data is indicative of a portion of an image, wherein each of C 
and R is an even integer; and 
for each row, 

loading the entire set of C row data values of the row into a set of C/2 
registers of the microprocessor; 
converting the C row data values into floating point form, wherein each of the
registers holds two of the floating point row data values, wherein said
converting is accomplished using a packed integer word to floating-point
conversion (pi2fw) instruction; and

performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations are performed 
using SIMD floating point instructions; 

altering the arrangement of values in the registers; 

performing a second plurality of weighted-rotation operations on the values in 
the registers; 

again altering the arrangement of the values in the registers; 
performing a third plurality of weighted-rotation operations on the values in 
the registers; 

yet again altering the arrangement of the values in the registers;
performing a fourth plurality of weighted-rotation operations on the values in
the registers to obtain C intermediate floating point values; and
storing the C intermediate floating point values into a next available row of an
intermediate buffer.



18. (Currently Amended) A computer system comprising:

a processor having an instruction set that includes single-instruction multiple-data 
(SIMD) floating point instructions; and 

a memory coupled to the processor, wherein the memory stores software instructions 
executable by the processor to implement the method of receiving a two- 
dimensional block of integer data having C columns and R rows, wherein the 
two-dimensional block of integer data is indicative of a portion of an image; 
and 

for two columns at a time, 

loading column data from the two columns into registers of the processor so 
that each of the registers holds one value from a first of the two 
columns and one value from a second of the two columns, wherein the 
one value from the first of the two columns and the one value from the
second of the two columns are taken from the same row of the two- 
dimensional block; 
converting the column data into floating point form; and 
performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations for the two 
columns are performed in parallel using SIMD floating point 
instructions, wherein said weighted-rotation operations are
accomplished using a packed floating-point multiplication (pfmul)
instruction, a packed floating-point subtraction (pfsub) instruction and
a packed floating-point addition (pfadd) instruction;
as each weighted-rotation operation is done, storing weighted-rotation 
operation results to an intermediate buffer. 

19. (Currently Amended) A carrier medium comprising software instructions executable by a
microprocessor having an instruction set that includes single-instruction multiple-data 
(SIMD) floating point instructions to implement a method of performing a discrete cosine 
transform (DCT), wherein the method comprises: 

receiving a two-dimensional block of integer data having C columns and R rows,
wherein the two-dimensional block represents a portion of an image; and
for two columns at a time, 

loading column data from the two columns into registers of the 
microprocessor so that each of the registers holds one value from a 
first of the two columns and one value from a second of the two
columns, wherein the one value from the first of the two columns and
the one value from the second of the two columns are taken from the
same row of the two-dimensional block;
converting the column data into floating point form; and
performing a plurality of weighted-rotation operations on the values in the 
registers, wherein the weighted-rotation operations for the two 
columns are performed in parallel using SIMD floating point 
instructions, wherein said weighted-rotation operations are
accomplished using a packed floating-point multiplication (pfmul)
instruction, a packed floating-point subtraction (pfsub) instruction and
a packed floating-point addition (pfadd) instruction;
as each weighted-rotation operation is done, storing weighted-rotation
operation results to an intermediate buffer.

20. (Cancelled) 

21. (Previously Presented) The method of claim 1, wherein C=8 and R=8. 

22. (Previously Presented) The method of claim 1, wherein each of the weighted rotations 
of said plurality, said second plurality, said third plurality and said fourth plurality has a
computational form given by the expressions: 

Y0 = A*X0 + B*X1, 
Y1 = -B*X0 + A*X1,

wherein A and B are coefficients, X0 and X1 are inputs to the weighted rotation, and Y0 and Y1
are results of the weighted rotation.
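Writing the expressions of claim 22 in matrix form shows why they are called weighted rotations. With k = sqrt(A^2 + B^2) and an angle theta chosen so that A = k cos(theta) and B = k sin(theta), the transform is a plane rotation by -theta scaled by k:

```latex
\begin{align*}
\begin{pmatrix} Y_0 \\ Y_1 \end{pmatrix}
  &= \begin{pmatrix} A & B \\ -B & A \end{pmatrix}
     \begin{pmatrix} X_0 \\ X_1 \end{pmatrix}
   = k \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
     \begin{pmatrix} X_0 \\ X_1 \end{pmatrix},
  \qquad k = \sqrt{A^2 + B^2}, \\
Y_0^2 + Y_1^2
  &= (A X_0 + B X_1)^2 + (A X_1 - B X_0)^2
   = (A^2 + B^2)\,(X_0^2 + X_1^2).
\end{align*}
```

Each weighted rotation therefore rescales the energy of its input pair by A^2 + B^2, and it is undone by the transposed coefficient matrix divided by A^2 + B^2.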